Tesfaldet M, Brubaker M A, Derpanis K G. Two-stream convolutional networks for dynamic texture synthesis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6703-6712.
1. Overview
1.1. Motivation
The two-stream hypothesis models the human visual cortex in terms of two pathways:
- ventral stream. involved in object recognition
- dorsal stream. involved in motion processing
This paper proposes a two-stream model for dynamic texture synthesis:
- appearance stream. encapsulates the per-frame appearance; pre-trained for object recognition
- dynamics stream. models the temporal dynamics; pre-trained for optical flow prediction
- combining the texture appearance of one texture with the dynamics of another generates entirely novel dynamic textures
- first work to demonstrate this form of style transfer
1.2. Related Work
- Two general approaches to texture synthesis
  - non-parametric sampling
  - statistical parametric models
- Gram Matrix
  - captures the style information while ignoring spatial location
  - computed by flattening the feature map and multiplying by its transpose: [b, c, h, w] → [b, c, hw], then [b, c, hw] × [b, hw, c] → [b, c, c]; see the sketch below
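A minimal PyTorch sketch of this computation (the function name and the 1/(hw) normalization are my choices, not taken from the paper):

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Batched Gram matrices from conv activations.

    features: [b, c, h, w] activations from some network layer.
    Returns:  [b, c, c] channel-to-channel correlations; summing over
              spatial positions is what discards the spatial location.
    """
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)          # [b, c, h, w] -> [b, c, hw]
    g = torch.bmm(f, f.transpose(1, 2))     # [b, c, hw] x [b, hw, c] -> [b, c, c]
    return g / (h * w)                      # normalize by spatial size
```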
1.3. Future Work
- extend the idea of a factorized representation into feed-forward generative networks
2. Method
Synthesizing a dynamic texture is formulated as an optimization problem: match the activation statistics (Gram matrices) of the generated sequence to those of the target texture(s), as sketched below.
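At the top level this is Gatys-style optimization over raw pixels. A minimal sketch, not the authors' code: `loss_fn` is assumed to implement the Gram-matching objectives of Sections 2.1 and 2.2, and L-BFGS follows common practice for this style of synthesis (the paper's optimizer may differ).

```python
import torch

def synthesize(loss_fn, t_out=12, h=256, w=256, steps=20):
    """Optimize the pixels of t_out frames directly.

    loss_fn: maps a [t_out, 3, h, w] tensor to a scalar loss that
             compares activation statistics against the targets.
    """
    # The synthesized video starts as noise; its pixels are the only
    # optimization variables (both pre-trained networks stay frozen).
    frames = torch.randn(t_out, 3, h, w, requires_grad=True)
    opt = torch.optim.LBFGS([frames], max_iter=25)

    def closure():
        opt.zero_grad()
        loss = loss_fn(frames)
        loss.backward()
        return loss

    for _ in range(steps):
        opt.step(closure)
    return frames.detach()
```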
2.1. Appearance Stream
Gram matrices are computed from the ConvNet activations:
- N_l. the number of filters in layer l
- M_l. the number of spatial locations in layer l
- t. the frame index (time t)
- k. the index of the spatial location
- i, j. the indices of the filters
The target statistics are Gram matrices averaged over the target frames (as ground truth):
- T. the number of target frames
A Gram matrix is also computed for each single frame to be synthesized (as prediction).
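Putting these symbols together, the per-frame Gram matrix and the target statistics should look roughly as follows (reconstructed from the definitions above; the normalization constants may differ from the paper):

```latex
% Gram matrix of layer l at frame t: correlation of filters i and j
G^{lt}_{ij} = \frac{1}{N_l M_l} \sum_{k=1}^{M_l} F^{lt}_{ik} F^{lt}_{jk}

% Target statistics: Gram matrices averaged over the T target frames
\bar{G}^{l}_{ij} = \frac{1}{T} \sum_{t=1}^{T} G^{lt}_{ij}
```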
- The Loss Function
  - L_{app}. the number of layers used to compute Gram matrices
  - T_{out}. the number of frames being generated in the output
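With those definitions, the appearance loss should take roughly this form (a reconstruction from the notes above; the exact per-layer weighting may differ in the paper):

```latex
\mathcal{L}_{\text{app}} = \frac{1}{T_{\text{out}} L_{\text{app}}}
  \sum_{t=1}^{T_{\text{out}}} \sum_{l=1}^{L_{\text{app}}}
  \big\| G^{lt} - \bar{G}^{l} \big\|_F^2
```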
2.2. Dynamic Stream
- input. a pair of consecutive greyscale frames
- T-1. the T target frames are grouped into (T-1) consecutive pairs
- The Loss Function. analogous to the appearance loss, but with Gram matrices computed on the flow network's activations, averaged over the (T-1) target pairs
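By analogy, the dynamics loss should mirror the appearance loss over frame pairs. In my notation (not the paper's): D^{lt} is the Gram matrix of layer l activations on pair (t, t+1), L_{dyn} the number of flow-network layers used, and \bar{D}^{l} the average over the (T-1) target pairs:

```latex
\mathcal{L}_{\text{dyn}} = \frac{1}{(T_{\text{out}}-1) L_{\text{dyn}}}
  \sum_{t=1}^{T_{\text{out}}-1} \sum_{l=1}^{L_{\text{dyn}}}
  \big\| D^{lt} - \bar{D}^{l} \big\|_F^2
```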
2.3. Overall
- the total objective combines the appearance and dynamics losses
- memory usage increases as the number of frames grows
- therefore, the sequence is separated into sub-sequences that are synthesized one at a time
- the first frame of each sub-sequence is initialized as the last frame of the previous sub-sequence and kept fixed, which keeps consecutive sub-sequences temporally consistent; see the sketch below
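A sketch of this sub-sequence scheme (the helper `synthesize_subsequence` is hypothetical; the key point is that the overlap frame is carried over and excluded from optimization):

```python
import torch

def synthesize_long(synthesize_subsequence, num_subseqs, sub_len):
    """Generate a long video piecewise so memory stays bounded.

    synthesize_subsequence(first_frame, length) is assumed to optimize
    `length` frames and return them as a [length, 3, h, w] tensor; when
    first_frame is not None it is kept fixed (excluded from the
    optimization variables), so consecutive sub-sequences join seamlessly.
    """
    video = []
    first = None  # the first sub-sequence starts from scratch
    for _ in range(num_subseqs):
        sub = synthesize_subsequence(first_frame=first, length=sub_len)
        # Drop the fixed overlap frame so it is not duplicated in the output.
        video.extend(sub if first is None else sub[1:])
        first = sub[-1]  # seed the next sub-sequence with the last frame
    return torch.stack(list(video))
```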
3. Experiments
3.1. w/o Dynamic Stream
3.2. Dynamics Loss: Flow Decode Layer vs Concat Layer
- computing the dynamics loss on the concatenation-layer activations is far more effective than using the flow decode layer
3.3. Failure Examples
- fails to capture spatially-inconsistent dynamics
- fails to capture textures with spatially-variant appearance